Gini Diversity Index, Hamming Distance and Curse of Dimensionality

Author

  • PRANAB K. SEN
Abstract

The celebrated Gini(-Simpson) biodiversity index has found very useful applications in ecology, bio-environmetrics, econometrics, psychometrics, genetics, and lately in bioinformatics as well. In such applications, mostly categorical data models crop up, possibly without an ordering of the categories, which may preempt routine use of conventional measures of quantitative diversity analysis. Further, real-life problems mostly involve genuinely multidimensional data models. The Hamming distance incorporates the idea of the Gini-Simpson diversity index in a variety of multidimensional setups, without making very stringent structural regularity assumptions. In bioinformatics, as well as in many other studies of large biological systems, the curse of dimensionality (arising in multidimensional, purely qualitative categorical data models) is a genuine concern. The role of Hamming-distance-based analysis is appraised in this context, with special attention to subgroup or MANOVA decomposability aspects.
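As a concrete illustration of the link the abstract alludes to, the sketch below (Python, with toy data and function names chosen purely for illustration) computes the Gini-Simpson index of each categorical coordinate and the average pairwise Hamming distance over a small sample; per coordinate, the average 0/1 mismatch indicator estimates that coordinate's Gini-Simpson index up to the usual n/(n-1) finite-sample factor.

```python
from collections import Counter
from itertools import combinations

def gini_simpson(values):
    """Gini-Simpson diversity index: 1 minus the sum of squared category proportions."""
    n = len(values)
    counts = Counter(values)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def hamming(u, v):
    """Hamming distance: number of coordinates in which two categorical vectors differ."""
    if len(u) != len(v):
        raise ValueError("vectors must have equal length")
    return sum(a != b for a, b in zip(u, v))

# Toy multivariate categorical sample (rows = subjects, columns = unordered categorical traits).
data = [
    ("A", "x", "low"),
    ("A", "y", "high"),
    ("B", "x", "low"),
    ("B", "y", "low"),
    ("A", "x", "high"),
]

# Per-coordinate Gini-Simpson diversity.
for j in range(len(data[0])):
    print(f"coordinate {j}: GS = {gini_simpson([row[j] for row in data]):.3f}")

# Average pairwise Hamming distance across the sample; its per-coordinate
# contribution is the average mismatch indicator, which estimates that
# coordinate's Gini-Simpson index up to the n/(n-1) factor.
pairs = list(combinations(data, 2))
avg_hamming = sum(hamming(u, v) for u, v in pairs) / len(pairs)
print(f"average pairwise Hamming distance = {avg_hamming:.3f}")
```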


Related articles

Binarising SIFT-Descriptors to Reduce the Curse of Dimensionality in Histogram-Based Object Recognition

It is shown that distance computations between SIFT-descriptors using the Euclidean distance suffer from the curse of dimensionality. The search for exact matches is less affected than the generalisation of image patterns, e.g. by clustering methods. Experimental results indicate that for the case of generalisation, the Hamming distance on binarised SIFT-descriptors is a much better choice. It i...
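To make the comparison in this snippet concrete, here is a minimal sketch, not the paper's method: descriptors are binarised with a simple per-dimension median threshold (an assumption made only for illustration) and then compared by Euclidean distance on the real values versus Hamming distance on the bits.

```python
import numpy as np

def binarise(descriptors, thresholds=None):
    """Binarise real-valued descriptors by thresholding each dimension
    (per-dimension median used here purely as an illustrative rule)."""
    if thresholds is None:
        thresholds = np.median(descriptors, axis=0)
    return (descriptors > thresholds).astype(np.uint8)

def hamming(a, b):
    """Hamming distance between two binary vectors (count of differing bits)."""
    return int(np.count_nonzero(a != b))

rng = np.random.default_rng(0)
desc = rng.normal(size=(4, 128))   # four toy 128-D "descriptors"
bits = binarise(desc)

# Euclidean distance on the real vectors vs. Hamming distance on the bit vectors.
print("Euclidean:", float(np.linalg.norm(desc[0] - desc[1])))
print("Hamming:  ", hamming(bits[0], bits[1]))
```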


Indexing Reduced Dimensionality Spaces Using Single Dimensional Indexes

The dimensionality curse has greatly affected the scalability of high-dimensional indexes. A well known approach to improving the indexing performance is dimensionality reduction before indexing the data in the reduced-dimensionality space. However, the reduction may cause loss of distance information when the data set is not globally correlated. To reduce loss of information and degradation of ...
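A minimal sketch of the effect this snippet describes, using a plain SVD-based PCA as the illustrative reduction and a synthetic two-cluster data set (both are assumptions, not the paper's setup): distances in the reduced space lower-bound the original ones, and the gap is the distance information lost by a single global projection.

```python
import numpy as np

rng = np.random.default_rng(2)

# Two clusters that vary along *different* directions, so no single global
# low-dimensional subspace preserves all pairwise distances well.
a = rng.normal(size=(50, 10)) * np.array([5, 1, 1, 1, 1, 1, 1, 1, 1, 1])
b = rng.normal(size=(50, 10)) * np.array([1, 1, 1, 1, 1, 1, 1, 1, 1, 5]) + 10
x = np.vstack([a, b])

# Global PCA down to 2 dimensions via SVD of the centred data.
xc = x - x.mean(axis=0)
_, _, vt = np.linalg.svd(xc, full_matrices=False)
z = xc @ vt[:2].T

# Compare a few original vs. reduced pairwise distances: the reduced
# distances never exceed the originals, and the gap is the lost information.
for i, j in [(0, 1), (0, 60), (55, 70)]:
    d_full = np.linalg.norm(x[i] - x[j])
    d_red = np.linalg.norm(z[i] - z[j])
    print(f"pair ({i:2d},{j:2d}): original {d_full:6.2f}  reduced {d_red:6.2f}")
```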


On the benefits of output sparsity for multi-label classification

The multi-label classification framework, where each observation can be associated with a set of labels, has generated a tremendous amount of attention over recent years. Modern multi-label problems are typically large-scale in terms of the number of observations, features and labels, and the number of labels can even be comparable with the number of observations. In this context, different rem...


GPU Accelerated Self-join for the Distance Similarity Metric

The self-join finds all objects in a dataset within a threshold of each other defined by a similarity metric. As such, the self-join is a building block for the field of databases and data mining, and is employed in Big Data applications. In this paper, we advance a GPU-efficient algorithm for the similarity self-join that uses the Euclidean distance metric. The search-and-refine strategy is an...
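The paper's GPU algorithm is not reproduced here; the following is only a naive quadratic reference implementation of the self-join operation it accelerates, with an arbitrary illustrative threshold eps.

```python
import numpy as np

def self_join(points, eps):
    """Naive O(n^2) self-join: return all index pairs (i, j), i < j,
    whose Euclidean distance is within the threshold eps."""
    pairs = []
    for i in range(len(points)):
        # Vectorised distances from point i to all later points.
        d = np.linalg.norm(points[i + 1:] - points[i], axis=1)
        pairs.extend((i, i + 1 + j) for j in np.flatnonzero(d <= eps))
    return pairs

rng = np.random.default_rng(1)
pts = rng.uniform(size=(100, 3))
print(len(self_join(pts, eps=0.2)), "pairs within distance 0.2")
```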


Curse of Dimensionality in the Application of Pivot-based Indexes to the Similarity Search Problem

In this work we study the validity of the so-called curse of dimensionality for indexing of databases for similarity search. We perform an asymptotic analysis, with a test model based on a sequence of metric spaces (Ω_d) from which we pick datasets X_d in an i.i.d. fashion. We call the subscript d the dimension of the space Ω_d (e.g. for R^d the dimension is just the usual one) and we allow the si...



Journal:

Volume:   Issue:

Pages:

Publication date: 2005